Aims

The main objective of this systematic literature survey was to identify gaps and trends in a representative sample of published studies using deep-learning algorithms to analyse image data for animal species, individual, or behaviour recognition and/or classification. To gain insights into recent developments presented in the academic literature, we focused on journal articles and full-text conference proceedings published in the last 5 years (2017-2021).

Inclusion criteria at the title and abstract screening phase

Following the PICO framework, we included articles if all criteria below were fulfilled:

  • Population: wild or semi-wild vertebrate species (exclude domestic or farmed animals, invertebrates, museum specimens).

  • Intervention / Innovation: use of computer vision machine learning algorithms (include neural-network type methods such as deep learning and CNN, support vector machines, random forest) for automated or semi-automated processing of image data (e.g. from camera traps, video tracking, thermal imaging) at a scale where individual animals are visible (include aerial and drone images; exclude images gathered from satellites, biologging, X-ray, MRI or equivalent).

  • Comparator / Context: images taken in the wild or semi-wild (includes zoo enclosures, excludes lab-based or agricultural/aquaculture/pet studies).

  • Outcomes: analyses focus on animal species or individual recognition/classification, or on animal behaviour recognition/classification.

  • Additional criteria: studies published in the last 5 years (2017-2021), peer-reviewed (including full-text conference proceedings).

Abstract screening procedure and results

We used Rayyan QCRI software to screen 2,259 unique bibliographic records downloaded from Scopus. Two researchers (ML, JT) independently performed the screening, assessing the titles, abstracts, and keywords of each article. This screening resulted in 225 articles included for full-text assessment and data extraction.

Inclusion criteria at full-text screening

  • Full text available
  • Full-text studies should fulfill the same criteria as defined for the title and abstract screening phase

Full text screening and data extraction

Out of the 225 papers included, we obtained full texts for 215 papers.
For data extraction we used a two-part custom questionnaire implemented as a Google Form (https://forms.gle/N7Hn9DVRjjmoKRd58). To pilot the form, we randomly selected 14 papers for independent screening and extraction by three researchers (ML, JT, RF). We resolved disagreements by discussion until consensus was reached, and we refined the questionnaire form before the main round of full-text screening and data extraction.
One researcher (ML) performed full-text screening and data extraction for the remaining 195 papers. A second researcher (RF) cross-checked 58 of these papers for accuracy and to resolve cases where information provided in the papers was unclear. After the full-text assessment, we extracted data from 192 studies.

Table S1 - full-text assessment and data extraction form

Question Answer options
Paper’s title: [text]
First author’s family name: [text]
Publication year: [number]
Journal name: [text]
Article doi: [text]
C1. Peer-reviewed empirical study [yes; no; unsure/other]
Comment for C1
C2. Is full text available in English? [yes; no; unsure/other]
Comment for C2
C3. Population: wild or semi-wild vertebrate species? [yes; no; unsure/other]
Comment for C3
C4. Intervention / Innovation: use of computer vision machine learning algorithms (for automated or semi-automated processing of image data at a scale where individual animals are visible)?: [yes; no; unsure/other]
Comment for C4
C5. Comparator / Context: are the studied animals in the wild or semi-wild? [yes; no; unsure/other]
Comment for C5
C6. Outcomes: focus on animal / species individual recognition / classification or animal behaviour recognition / classification? [yes; no; unsure/other]
Comment for C6
Q1. Number of studied species [number]
Comment for Q1
Q2. Study species (Latin name) [text]
Comment for Q2
Q3. Studied species group: [mammals; birds; reptiles; amphibians; fishes; other/unclear]*
Comment for Q3
Q4. Used image type source: [camera trap or surveillance camera (fixed); aerial (including drone); hand camera (or mobile phone camera); other/unclear]*
Comment for Q4
Q5. Study context or setting: [wild; semi-wild; unclear/other]*
Comment for Q5
Q6. Location country/region: [text]
Q7. Location details: [text]
Q8. Algorithm type: [Neural Network; Random forest; Gradient boosting model; Support Vector Machines; Rule-based learners; Decision trees; K-Nearest Neighbour; unclear/other]*
Q9. Outcome type: [counting individuals (at given time); individual recognition (re-identification); species recognition/classification (class/object detection); behaviour detection (at given time); tracking (following through space); behaviour classification (changes over time); unclear/other]*
Q10. Analysis code [yes; no; unclear/other]

Note: * indicates plural variables (i.e. more than one answer option can be chosen).

Each question in the data extraction form (Table S1) is followed by a dedicated comment field used to record any additional details, including relevant quotes from the paper. We excluded any papers that were coded as “no” at any of the questions C1 to C6 (the full-text screening questions, which establish whether the paper fulfills our inclusion criteria), i.e. these papers were not subject to any further data extraction and analyses.
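
The exclusion rule can be sketched as a simple filter. This is a minimal Python illustration, not the code used for the analyses; the record layout and paper identifiers are hypothetical, and it assumes that a single "no" at any of C1 to C6 excludes a paper while "unsure/other" on its own does not.

```python
# Screening questions from Table S1.
SCREENING_QUESTIONS = ["C1", "C2", "C3", "C4", "C5", "C6"]


def is_excluded(record):
    """Return True if any screening question was coded "no" (assumed rule)."""
    return any(record.get(q) == "no" for q in SCREENING_QUESTIONS)


# Hypothetical coded records (illustrative only, not the extracted data).
records = [
    {"id": "paper_a", "C1": "yes", "C2": "yes", "C3": "yes",
     "C4": "yes", "C5": "yes", "C6": "yes"},
    {"id": "paper_b", "C1": "yes", "C2": "yes", "C3": "no",
     "C4": "yes", "C5": "yes", "C6": "yes"},
    {"id": "paper_c", "C1": "yes", "C2": "yes", "C3": "yes",
     "C4": "unsure/other", "C5": "yes", "C6": "yes"},
]

included = [r["id"] for r in records if not is_excluded(r)]
print(included)
```

Papers coded "unsure/other" stay in the list here so that they can be resolved through the comment fields and cross-checking rather than dropped automatically.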

After data extraction, additional columns were added to the data table with the following data:
- Q7_coordinates: latitude and longitude of the study location, as in the paper or from Google Maps, if not reported
- Q7_location_unclear: 0 = clear (location at least at the level of national park, state, province, city, or equivalent - reported in the article or inferred from the data set name); 1 = unclear, location either not reported or cannot be assigned to a specific location (e.g., global data, broad regions such as Arctic, Northern Atlantic, Africa, America)
- Checked: whether record was cross-checked by an independent researcher
- Checking_comments: any comments from data extraction checking
- Changed: whether record was changed after cross-checking
- Changed_comment: how record was changed after cross-checking
- Pilot: whether study was used in the piloting phase
- Included: whether study was included in the final data set for extraction
- Exclusion reason: main reason for excluding study from the final data set for extraction, if excluded

Screening results

Out of the 215 full-text articles screened, 192 were deemed eligible for data extraction (Table S2). The data extraction spreadsheet is stored as mapping_dataset_reconciled.xlsx. Below, we present a summary of the extracted data.

Summary tables

Table S2

List of articles excluded at full-text screening, with main reasons for exclusion.

Table S3

List of included articles with key bibliographic information.

Preprocessing extracted data

Data cleaning before generating summaries and plotting.

Summaries and plots

Publication year

Top publication journals

A barplot of the counts of publications in different journals, with journals with >2 included papers shown, sorted in descending order of frequency.
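
The data behind such a barplot can be prepared as in the following minimal Python sketch. It is an illustration only: the journal names are placeholders rather than the extracted data, and the threshold simply mirrors the ">2 included papers" rule described above.

```python
from collections import Counter

# Hypothetical journal name per included paper (placeholders, not real data).
journals = ["Journal A"] * 4 + ["Journal B"] * 3 + ["Journal C"] * 2 + ["Journal D"]

counts = Counter(journals)

# Keep journals with more than 2 included papers, most frequent first.
top_journals = [(name, n) for name, n in counts.most_common() if n > 2]
print(top_journals)
```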

To infer discipline / audience type we categorized publication journals as: computer science / technology, ecology, multidisciplinary.

A barplot of the counts of publications in different journals, with journals with >2 included papers shown, sorted in descending order of frequency, now with journals coloured by discipline.

Number of species / animal classes used

Most data sets have a prespecified number of animal species / classes present. A class can represent a species or a higher taxonomic group, such as genus, family, order, super-order, etc. (even “animals” can be a class). Classes of non-animal objects (e.g. humans, vehicles) were not counted. When more than one dataset was used, the number was extracted for the biggest dataset.

Brief summary statistics on the number of species.

Table S3

Papers with > 100 species/animal classes.

A histogram with numbers of classes on a log x-scale due to strong right-skew in the data.

The same data as a barplot displaying the actual values of the numbers of species / classes.

Study species in one-animal studies

For studies focusing on a single animal species, we extracted the species name to investigate which particular species were most popular (subspecies names are omitted from the plot labels).

Plot with colour coding of biological groups.

Types of animals used

Most popular types of animals as represented by commonly used “biological” categories. One study could be coded as studying one or more categories of animals, e.g. both mammals and birds. However, the distribution of the number of species within multi-category studies was often uneven, e.g. a commonly used Serengeti dataset from Tanzania is dominated by large mammals, with only a few species of large birds considered in analyses.

Types of image sources used

Image sources were categorized by the type of hardware used to collect image data: fixed surveillance/trap cameras (often activated by movement, or continuously recording), hand-held devices including mobile phones, or devices mounted on aerial vehicles including drones. Where it was not clearly reported in a paper, we inferred the image source from the example images from the analysed dataset or from descriptions of the dataset in other publications. A single study could be coded as using one or more categories of image sources, e.g. a mix of camera traps and hand-held cameras.

Types of settings for animal images

Settings of the images used were classified as wild or semi-wild (outdoor enclosures for wild animals). A single study could be coded as using one or more categories of settings, e.g. a mix of images from the wild and captive animals.

Location country

Country or a larger region where animal images were collected. A single study could be coded as using images from one or more countries/regions. Some studies using images of captive animals kept in zoos, likely across multiple countries, were coded as “global” (often images sourced from the Internet/social platforms).

A barplot of the counts of articles originating from a given country / larger region. “Global” are usually datasets based on images collected from the Internet or social media.

A choropleth map of the counts of articles based on a dataset originating from a given country. Data gathered from regions larger than a country (e.g. oceans, continents, global) are not shown.

Location coordinates

Location coordinates represent either a specific location (green circles) or the centroid of a broader region (orange circles) from which animal images originated. Darker circles indicate a larger number of studies using images from a given location. Global image datasets (e.g. gathered from the Internet or social media) are not shown.

Types of machine learning algorithm for analysing animal images

Barplot of the main types of machine learning algorithms used. A single study could be coded as using one or more types.

Types of outcomes from analysing animal images

Barplot of the main types of outcomes / purposes of analyses used. A single study could be coded as using one or more types.

Whether analysis code is available

Barplot of the analysis code availability. Code was coded as available when a link to a code repository was provided in the article.

Cross-tabulating two factors

A set of draft plots using information from two or more extracted variables. To be refined.

Algorithm vs studied outcome heat map

A heatmap showing crosstabulation of the main types of machine learning algorithms and analysis outcomes / purposes. A single study could be coded as using one or more types for both variables.
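
Because both variables are multi-select, each study contributes one count to every combination of its coded algorithm types and outcome types. The following minimal Python sketch illustrates that counting under this assumption; the study records are hypothetical and this is not the code used to produce the heatmap.

```python
from collections import Counter
from itertools import product

# Hypothetical multi-select codings for three studies (illustrative only).
studies = [
    {"algorithms": ["Neural Network"],
     "outcomes": ["species recognition/classification"]},
    {"algorithms": ["Neural Network", "Support Vector Machines"],
     "outcomes": ["individual recognition", "tracking"]},
    {"algorithms": ["Random forest"],
     "outcomes": ["species recognition/classification"]},
]

# Count every (algorithm, outcome) pair coded within the same study.
crosstab = Counter()
for study in studies:
    crosstab.update(product(study["algorithms"], study["outcomes"]))

for (algorithm, outcome), n in sorted(crosstab.items()):
    print(f"{algorithm} x {outcome}: {n}")
```

With this counting, cell totals can exceed the number of studies: the three studies above yield six algorithm-outcome pairs.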

Year vs studied outcome heat map

A heatmap showing crosstabulation of the study publication year and analysis outcomes / purposes. A single study could be coded as using one or more types for the analysis outcomes / purposes.

Year vs studied outcome stacked area chart

A stacked area chart showing crosstabulation of the study publication year and analysis outcomes / purposes. A single study could be coded as using one or more types for the analysis outcomes / purposes.

Year vs algorithm type heat map

A heatmap showing crosstabulation of the study publication year and the main types of machine learning algorithms. A single study could be coded as using one or more types for the types of machine learning algorithms.

Year vs algorithm type stacked area chart

A stacked area plot showing crosstabulation of the study publication year and the main types of machine learning algorithms. A single study could be coded as using one or more types for the types of machine learning algorithms.

Year vs algorithm type proportional stacked area chart

A stacked area plot showing yearly changes in proportions of the main types of machine learning algorithms used. A single study could be coded as using one or more types for the types of machine learning algorithms.
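
Yearly proportions for a multi-select variable can be derived as in the minimal Python sketch below. The codings are hypothetical, and the choice of denominator (all algorithm codings in a year, rather than the number of studies) is an assumption made so that proportions sum to 1 within each year.

```python
from collections import Counter, defaultdict

# Hypothetical codings: (publication year, algorithm types coded for a study).
studies = [
    (2017, ["Support Vector Machines"]),
    (2017, ["Neural Network", "Support Vector Machines"]),
    (2018, ["Neural Network"]),
    (2018, ["Neural Network", "Random forest"]),
]

# Tally algorithm codings per publication year.
counts_by_year = defaultdict(Counter)
for year, algorithms in studies:
    counts_by_year[year].update(algorithms)

# Divide each count by the total number of codings in that year.
proportions = {
    year: {alg: n / sum(counts.values()) for alg, n in counts.items()}
    for year, counts in counts_by_year.items()
}
print(proportions)
```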

Year vs journal discipline stacked area chart

A stacked area plot showing yearly changes in the counts of publications by journal discipline.

Year vs image source type stacked area chart

A stacked area plot showing yearly changes in the types of images used for analyses.

Individual recognition by species

A barplot of species names for studies focusing on re-identification of individuals. Species names were extracted only from papers focusing on a single species (i.e. data not shown for 12 multi-species studies).

Phylogenetic tree of species used in studies focusing on re-identification of individuals. Species names were extracted only from papers focusing on a single species (i.e. data not shown for 12 multi-species studies). The tree was built using the rotl R package (https://peerj.com/preprints/1471/), which provides access to the synthetic phylogenetic tree available from the Open Tree of Life database (https://opentreeoflife.org/).

Overlay organism silhouettes

Note: phylopic.org hosts free silhouette images of animals, plants, and other life forms, all under Creative Commons or Public Domain. Colours are also used to indicate biological groups across the plot.

Note: Plotting to pdf causes some of the animal silhouettes to have distorted height:width ratios. These will be fixed manually in Adobe Illustrator (saved as Figure2_tree_Ai.pdf).

Reporting quality

Table of counts of studies with unclear or missing data for key extracted variables (as applicable).

A stacked barplot of counts of studies with unclear and missing data for key extracted variables (as applicable).

Bibliometric analyses

These analyses are based on the information extracted from bibliographic records downloaded from Scopus. Initial preprocessing and summaries were done using the bibliometrix R package. Subsequently, these data were combined with manually coded data from the full texts.

First author country

Load and export author affiliation country from bibliographic records (scopus_AI_1and2.bib).

## 
## Converting your wos collection into a bibliographic dataframe
## 
## 
## Warning:
## In your file, some mandatory metadata are missing. Bibliometrix functions may not work properly!
## 
## Please, take a look at the vignettes:
## - 'Data Importing and Converting' (https://www.bibliometrix.org/vignettes/Data-Importing-and-Converting.html)
## - 'A brief introduction to bibliometrix' (https://www.bibliometrix.org/vignettes/Introduction_to_bibliometrix.html)
## 
## 
## Missing fields:  ID CR 
## Done!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!

Initial data cleaning and merging with manually coded data frame. Standard bibliometric data summary.

## 
## 
## MAIN INFORMATION ABOUT DATA
## 
##  Timespan                              2017 : 2021 
##  Sources (Journals, Books, etc)        127 
##  Documents                             192 
##  Average years from publication        2.53 
##  Average citations per documents       0 
##  Average citations per year per doc    0 
##  References                            1 
##  
## DOCUMENT CONTENTS
##  Keywords Plus (ID)                    0 
##  Author's Keywords (DE)                1064 
##  
## AUTHORS
##  Authors                               797 
##  Author Appearances                    919 
##  Authors of single-authored documents  0 
##  Authors of multi-authored documents   797 
##  
## AUTHORS COLLABORATION
##  Single-authored documents             0 
##  Documents per Author                  0.241 
##  Authors per Document                  4.15 
##  Co-Authors per Documents              4.79 
##  Collaboration Index                   4.15 
##  
## 
## Annual Scientific Production
## 
##  Year    Articles
##     2017       19
##     2018       20
##     2019       47
##     2020       63
##     2021       43
## 
## Annual Percentage Growth Rate 22.65315 
## 
## 
## Most Productive Authors
## 
##    Authors        Articles Authors        Articles Fractionalized
## 1  ANAND S               4   HAMILTON G                     1.200
## 2  CHEN P                4   ANAND S                        1.025
## 3  CLUNE J               4   DLAMINI N                      1.000
## 4  FALZON G              4   FAVORSKAYA M                   1.000
## 5  HAMILTON G            4   VAN ZYL TL                     1.000
## 6  NOROUZZADEH MS        4   SCHNEIDER S                    0.917
## 7  BOWLEY C              3   TAYLOR GW                      0.917
## 8  CHAUMONT M            3   FALZON G                       0.883
## 9  CORCORAN E            3   CORCORAN E                     0.867
## 10 DENMAN S              3   DENMAN S                       0.867
## 
## 
## Top manuscripts per citations
## 
##            Paper                                        DOI TC TCperYear NTC
## 1  AFÁN I, 2018,          10.3390/drones2040042              0         0 NaN
## 2  AKÇAY HG, 2020,        10.3390/ani10071207                0         0 NaN
## 3  ALLKEN V, 2021,        10.1002/gdj3.114                   0         0 NaN
## 4  ALQARALLEH BAY, 2020,  10.1109/ACCESS.2020.3039695        0         0 NaN
## 5  AMIR A, 2017,          10.1007/978-3-319-48517-1_5        0         0 NaN
## 6  ARSHAD B, 2020,        10.1109/SENSORS47125.2020.9278802  0         0 NaN
## 7  ATANBORI J, 2018,      10.1016/j.ecoinf.2018.07.005       0         0 NaN
## 8  BAIN M, 2019,          10.1109/ICCVW.2019.00032           0         0 NaN
## 9  BANUPRIYA N, 2020,     10.31838/jcr.07.01.85              0         0 NaN
## 10 BEERY S, 2018,         10.1007/978-3-030-01270-0_28       0         0 NaN
## 
## 
## Corresponding Author's Countries
## 
##         Country Articles   Freq SCP MCP MCP_Ratio
## 1  CHINA              26 0.1503  25   1    0.0385
## 2  USA                23 0.1329  23   0    0.0000
## 3  AUSTRALIA          22 0.1272  22   0    0.0000
## 4  INDIA              21 0.1214  21   0    0.0000
## 5  CANADA              9 0.0520   9   0    0.0000
## 6  GERMANY             7 0.0405   7   0    0.0000
## 7  SOUTH AFRICA        6 0.0347   6   0    0.0000
## 8  INDONESIA           5 0.0289   5   0    0.0000
## 9  ECUADOR             4 0.0231   4   0    0.0000
## 10 FRANCE              4 0.0231   4   0    0.0000
## 
## 
## SCP: Single Country Publications
## 
## MCP: Multiple Country Publications
## 
## 
## Total Citations per Country
## 
##      Country      Total Citations Average Article Citations
## 1  ARGENTINA                    0                         0
## 2  AUSTRALIA                    0                         0
## 3  BAHRAIN                      0                         0
## 4  BANGLADESH                   0                         0
## 5  BELARUS                      0                         0
## 6  BELGIUM                      0                         0
## 7  BRAZIL                       0                         0
## 8  CAMEROON                     0                         0
## 9  CANADA                       0                         0
## 10 CHINA                        0                         0
## 
## 
## Most Relevant Sources
## 
##    Articles Sources
## 1        11 ECOLOGICAL INFORMATICS
## 2        10 LECTURE NOTES IN COMPUTER SCIENCE (INCLUDING SUBSERIES LECTURE NOTES IN ARTIFICIAL INTELLIGENCE AND LECTURE NOTES IN BIOINFORMATICS)
## 3        10 METHODS IN ECOLOGY AND EVOLUTION
## 4         8 ECOLOGY AND EVOLUTION
## 5         5 COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE
## 6         5 PROCEEDINGS - 2019 INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOP ICCVW 2019
## 7         4 ANIMALS
## 8         3 ADVANCES IN INTELLIGENT SYSTEMS AND COMPUTING
## 9         3 REMOTE SENSING
## 10        2 CEUR WORKSHOP PROCEEDINGS
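
Some of the figures in the summary above can be re-derived from the reported counts. The minimal Python sketch below assumes that the annual percentage growth rate is a compound annual growth rate between the first and last year, and that the per-document ratios are simple quotients of the reported totals; both assumptions reproduce the printed values, but the sketch is not the bibliometrix implementation.

```python
# Counts copied from the "Annual Scientific Production" table above.
articles_per_year = {2017: 19, 2018: 20, 2019: 47, 2020: 63, 2021: 43}

years = sorted(articles_per_year)
first = articles_per_year[years[0]]
last = articles_per_year[years[-1]]
n_intervals = years[-1] - years[0]  # four yearly steps from 2017 to 2021

# Compound annual growth rate, in percent (assumed definition).
growth_rate = ((last / first) ** (1 / n_intervals) - 1) * 100

# Totals copied from the "MAIN INFORMATION ABOUT DATA" block above.
documents = 192
authors = 797
author_appearances = 919

authors_per_document = authors / documents
co_authors_per_document = author_appearances / documents
documents_per_author = documents / authors

print(round(growth_rate, 3))
print(round(authors_per_document, 2))
print(round(co_authors_per_document, 2))
print(round(documents_per_author, 3))
```

The yearly counts also sum to the 192 documents reported in the same summary.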

A barplot of the country assigned to each publication, based on the affiliation country of the first author. Co-authorship type is based on the countries of all authors of a given publication. SCP indicates that all authors were affiliated with the same country. MCP indicates international co-authorship.
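
The SCP / MCP distinction can be expressed as a simple rule over the set of author affiliation countries. The following Python sketch is an illustration under that assumption; the function name and the example country lists are hypothetical, and the bibliometrix package may derive these labels differently.

```python
def coauthorship_type(author_countries):
    """Return "SCP" if all authors share one country, otherwise "MCP"."""
    return "SCP" if len(set(author_countries)) == 1 else "MCP"


# Hypothetical affiliation countries per publication (illustrative only).
publications = {
    "paper_a": ["AUSTRALIA", "AUSTRALIA", "AUSTRALIA"],
    "paper_b": ["CHINA", "USA"],
}

labels = {pub: coauthorship_type(c) for pub, c in publications.items()}
print(labels)
```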

Country publication counts and co-authorship types

A choropleth map of the counts of articles with their first author affiliated with a given country.

A choropleth map of the counts of articles with their first author affiliated with a given country (using a different colour scheme), with image collection locations presented as points. Locations represent either a specific study site (green circles) or the centroid of a broader region (orange circles) from which animal images originated. Darker circles indicate a larger number of studies using images from a given location. Global image datasets (e.g. gathered from the Internet or social media) are not shown.

An alluvial plot representing overlaps of countries of affiliation of study first author and countries / regions where image data originated from. Data in global and unclear locations are included in the plot.

A choropleth map showing overlaps of countries of affiliation of study first author and locations where image data originated from. Data in global and unclear locations are not included in the plot. Arrows link locations of images to locations of ex situ authors (i.e. authors working on image datasets from a different country / region) and bubbles represent in situ authors (i.e. authors working on image datasets from the same country / region), scaled proportionally to article counts. Multi-country / global image datasets (e.g. gathered from the Internet or social media) are not shown.

DONE since previous

  • classify journals into comp.sci vs. ecology journals - DONE
  • plot for year vs. algorithm type - DONE
  • plot for year vs. outcome type - DONE
  • individual recognition for which species? - DONE
  • reporting quality - DONE
  • bibliometric analyses: affiliation country distribution (first author, all authors, within-study diversity) and overlap with study location - DONE